Diachronic semantic cohesion for topic segmentation of TV broadcast news

نویسندگان

  • Abdessalam Bouchekif
  • Géraldine Damnati
  • Yannick Estève
  • Delphine Charlet
  • Nathalie Camelin
چکیده

This paper proposes a new way to integrate semantic relations into a topic segmentation process by defining the notion of semantic cohesion. In the context of a sliding window based automatic topic segmentation algorithm, semantic relations are incorporated in the similarity measure between adjacent blocs. Additionaly, in the context of TV Brodcast News topic segmentation, we propose a new protocole to gather relevant data for semantic relations computation, showing that a small set of diachronic data can be more relevant for the task than using a large amount of general or asynchronous data. Experiments on a corpus of 86 various French TV Broadcast News shows recorded during one week, in conjunction with text articles collected through the Google News homepage at the same period for semantic relation estimation show significant improvement in topic segmentation performance.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Enhancing lexical cohesion measure with confidence measures, semantic relations and language model interpolation for multimedia spoken content topic segmentation

Transcript-based topic segmentation of TV programs faces several difficulties arising from transcription errors, from the presence of potentially short segments and from the limited number of word repetitions to enforce lexical cohesion, i.e., lexical relations that exist within a text to provide a certain unity. To overcome these problems, we extend a probabilistic measure of lexical cohesion ...

متن کامل

Speech cohesion for topic segmentation of spoken contents

In this paper, we introduce the notion of speech cohesion for topic segmentation of a spoken content. The aim is to integrate speaker information and lexical information within a single cohesion value. Based on a lexical cohesion system, we propose an approach that directly integrates the speaker distribution when processing the cohesion. A potential boundary is effective if the joint distribut...

متن کامل

The need to create a media block for the convergence of overseas news networks

As a general diplomacy arm of the Islamic Republic of Iran, VoSiMa has extensive activities in international broadcasting of its radio and television programs. These programs are broadcast in different languages, such as English, French, Azeri, Arabic, and ... for regional and transnational audiences. The large volume of the organization's international activities is in the form of news and new...

متن کامل

How Diachronic Text Corpora Affect Context based Retrieval of OOV Proper Names for Audio News

Out-Of-Vocabulary (OOV) words missed by Large Vocabulary Continuous Speech Recognition (LVCSR) systems can be recovered with the help of topic and semantic context of the OOV words captured from a diachronic text corpus. In this paper we investigate how the choice of documents for the diachronic text corpora affects the retrieval of OOV Proper Names (PNs) relevant to an audio document. We first...

متن کامل

Segmentation and Indexation of Broadcast News

This paper describes a topic segmentation and indexation system for broadcast news that is integrated in an alert system for selective dissemination of multimedia information. The goal of this work is to enhance the retrieval and navigation through specific spoken audio segments that have been automatically transcribed, using speech recognition. Our segmentation algorithm is based on simple heu...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015